Learning Recurrent Neural Networks with Hessian-Free Optimization
Authors
James Martens, Ilya Sutskever
Abstract
In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train RNNs on two sets of challenging problems. The first is a collection of pathological synthetic datasets known to be impossible for standard optimization approaches due to their extremely long-term dependencies; the second is three natural and highly complex real-world sequence datasets, on which we find that our method significantly outperforms the previous state-of-the-art method for training neural sequence models: the Long Short-Term Memory approach of Hochreiter and Schmidhuber (1997). Additionally, we offer a new interpretation of the generalized Gauss-Newton matrix of Schraudolph (2002), which is used within the HF approach of Martens.
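To make the approach concrete, the sketch below shows the outer loop common to Hessian-free methods: each update approximately solves the damped linear system (G + λI)δ = −∇f with conjugate gradient, touching the curvature matrix G only through matrix-vector products. This is a minimal illustration, not the authors' implementation: gauss_newton_vec is assumed to be a black box returning Gv, the damping shown is plain Tikhonov damping (the paper's novel structural damping adds a further penalty term), and the Levenberg-Marquardt adjustment of λ is reduced to a comment.

```python
import numpy as np

def conjugate_gradient(mvp, b, x0, max_iters=50, tol=1e-6):
    """Approximately solve A x = b, where A is available only through
    the matrix-vector product mvp(v) = A v."""
    x = x0.copy()
    r = b - mvp(x)                  # residual
    d = r.copy()                    # search direction
    rr = float(r @ r)
    for _ in range(max_iters):
        Ad = mvp(d)
        alpha = rr / float(d @ Ad)  # step length along d
        x = x + alpha * d
        r = r - alpha * Ad
        rr_new = float(r @ r)
        if np.sqrt(rr_new) < tol:
            break
        d = r + (rr_new / rr) * d   # next conjugate direction
        rr = rr_new
    return x

def hf_step(params, grad, gauss_newton_vec, lam, prev_delta):
    """One damped Hessian-free update: solve (G + lam*I) delta = -grad
    with CG, warm-started from a decayed copy of the previous solution.
    In practice lam is then raised or lowered by a Levenberg-Marquardt
    heuristic, based on how well the quadratic model predicted the
    actual reduction in the objective."""
    damped_mvp = lambda v: gauss_newton_vec(v) + lam * v
    delta = conjugate_gradient(damped_mvp, -grad, x0=0.95 * prev_delta)
    return params + delta, delta
```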
Similar resources
Learning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials
2 Details of the pathological synthetic problems
2.1 The addition, multiplication, and XOR problem
2.2 The temporal order problem
2.3 The 3-bit temporal order problem
2.4 The random permutation problem
2.5 Noiseless memorization ...
Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks
Multidimensional recurrent neural networks (MDRNNs) have shown remarkable performance in the area of speech and handwriting recognition. The performance of an MDRNN is improved by further increasing its depth, and the difficulty of learning the deeper network is overcome by using Hessian-free (HF) optimization. Given that connectionist temporal classification (CTC) is utilized as an objective...
Training Neural Networks with Stochastic Hessian-Free Optimization
Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks. HF uses the conjugate gradient algorithm to construct update directions through curvature-vector products that can be computed on the same order of time as gradients. In this paper we exploit this property and study stochastic HF with gradient and curvature mini-batches independent o...
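The curvature-vector products mentioned in that abstract are typically generalized Gauss-Newton products, Gv = Jᵀ H_L J v, where J is the Jacobian of the network outputs with respect to the parameters and H_L is the Hessian of the loss with respect to the outputs. Below is a hedged sketch in JAX; the names f (the network function) and loss are placeholders for an arbitrary model, not any particular library API. It uses one forward-mode pass, one small Hessian-vector product at the outputs, and one reverse-mode pass, which is indeed on the same order of cost as a gradient.

```python
import jax

def gauss_newton_vector_product(f, loss, params, inputs, targets, v):
    """Compute G v = J^T H_L J v for outputs = f(params, inputs) and a
    scalar loss(outputs, targets). Costs roughly a constant multiple of
    one gradient evaluation."""
    # J v: forward-mode product of the network Jacobian with v.
    outputs, Jv = jax.jvp(lambda p: f(p, inputs), (params,), (v,))
    # H_L (J v): Hessian of the loss w.r.t. the outputs, applied to Jv.
    HJv = jax.jvp(jax.grad(lambda o: loss(o, targets)),
                  (outputs,), (Jv,))[1]
    # J^T (H_L J v): reverse-mode pullback through the network.
    _, vjp_fn = jax.vjp(lambda p: f(p, inputs), params)
    return vjp_fn(HJv)[0]
```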
On the Efficiency of Recurrent Neural Network Optimization Algorithms
This study compares the sequential and parallel efficiency of training Recurrent Neural Networks (RNNs) with Hessian-free optimization versus a gradient descent variant. Experiments are performed using the long short-term memory (LSTM) architecture and the newly proposed multiplicative LSTM (mLSTM) architecture. The results provide a number of insights into these architectures and optimization ...
On the importance of initialization and momentum in deep learning
Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train...
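For reference, here is a minimal sketch of the kind of slowly increasing momentum schedule that abstract refers to, paired with classical momentum updates. The function names and the specific constants (the 250-step doubling interval, the 0.99 cap) are illustrative choices patterned on schedules reported for this approach, not a definitive reproduction.

```python
import numpy as np

def momentum_schedule(t, mu_max=0.99):
    """Slowly increasing momentum: mu_t rises from 0.5 toward mu_max as
    training progresses (illustrative form and constants)."""
    return min(1.0 - 2.0 ** (-1 - np.log2(np.floor(t / 250.0) + 1)),
               mu_max)

def sgd_with_momentum(grad_fn, theta, lr=0.01, steps=1000):
    """Classical momentum: v <- mu*v - lr*grad; theta <- theta + v."""
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        mu = momentum_schedule(t)
        v = mu * v - lr * grad_fn(theta)
        theta = theta + v
    return theta
```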
Publication year: 2011